Abstract

We study the localization of a cluster of activated vertices in a graph, from adaptively designed com- 
pressive measurements. We propose a hierarchical partitioning of the graph that groups the activated 
vertices into few partitions, so that a top-down sensing procedure can identify these partitions, and hence 
the activations, using few measurements. By exploiting the cluster structure, we are able to provide local- 
ization guarantees at weaker signal to noise ratios than in the unstructured setting. We complement this 
performance guarantee with an information theoretic lower bound, providing a necessary signal-to-noise 
ratio for any algorithm to successfully localize the cluster. We verify our analysis with some simulations, 
demonstrating the practicality of our algorithm. 

1 Introduction 

We are interested in recovering the support of a sparse vector $x \in \mathbb{R}^n$ observed through the noisy linear
model:

$$y_i = a_i^T x + \epsilon_i$$

where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $\sum_i \|a_i\|^2 \le m$. This support recovery problem is well known and fundamental
to the theory of compressive sensing, which involves estimating a high-dimensional signal vector from few
linear measurements [4]. Indeed, if the non-zero components of $x$ have magnitude at least $\mu$, it is now well known
that one can recover supp($x$) if $\mu/\sigma = \omega(\sqrt{(n/m) \log n})$ and one cannot if $\mu/\sigma = o(\sqrt{(n/m) \log n})$ [12].

We build upon the classical results of compressive sensing by developing procedures that are adaptive 
and that exploit additional structure in the underlying signal. Adaptivity allows the procedure to focus mea- 
surements on activated components of the signal while structure can dramatically reduce the problem search 
space. Combined, both ideas can lead to significant performance improvements over classical compressed 
sensing. This paper explores the role of adaptivity and structure in a very general support recovery problem. 
Setting                   Necessary                    Sufficient

Passive, unstructured     $\sqrt{(n/m) \log n}$ [12]   $\sqrt{(n/m) \log n}$ [12]
Adaptive, unstructured                                 $\sqrt{(n/m) \log k}$ [6]
Adaptive, structured                                   $\sqrt{(n/m) \log(\rho \log n)}$ (Prop. 3)

Table 1: Compressed Sensing landscape.



Graph          Structure         Necessary    Sufficient

2-d Lattice    Rectangle
Rooted Tree    Rooted subtree                 $\sqrt{(k/m) \log k}$ [11]
Arbitrary      Best case                      $\frac{1}{k}\sqrt{(n/m) \log((\rho + k) \log n)}$

Table 2: Adaptive/Structured Sensing landscape.



Active learning and adaptivity are by no means new ideas to the signal processing community and a 
number of papers in recent years have characterized, with upper and lower bounds, the advantages and 
limits of adaptive sensing over passive approaches [1, 6, 9]. One of the first ideas in this direction was
distilled sensing Q, which uses direct rather than compressive measurements. Inspired by that work, a 
number of authors have studied adaptivity in compressive sensing and shown similar performance gains. 

The introduction of structure to the compressed sensing framework has also been explored by a number
of authors [11, 3, 2]. Broadly speaking, these structural assumptions restrict the signal to few of the $\binom{n}{k}$
linear subspaces that contain $k$-sparse signals. With these restrictions, one can often design sensing procedures
that focus on these allowed subspaces and enjoy significant performance improvements over unstructured
problems. We remark that both [11] and [2] develop adaptive sensing procedures for structured problems,
but in a more restrictive setting than this study.

This paper continues in both of these directions, exploring the role of adaptivity and structure in recovering
activated clusters in graphs. We consider localizing clusters whose boundary in the graph is smaller
than some parameter $\rho$. This notion of structure is more general than previous studies, yet we are still able
to demonstrate performance improvements over unstructured problems. 

Our study of cluster identification is motivated by a number of applications in sensor network measurement
and monitoring, including identification of viruses in human or computer networks and contamination
in a body of water. In these settings, we expect the signal of interest to be localized, or clustered, in the
underlying network, and we want to develop efficient procedures that exploit this cluster structure.

In this paper, we propose two related adaptive sensing procedures for identifying a cluster of activations 
in a network. We give a sufficient condition on the SNR under which the first procedure exactly identifies 
the activated cluster. While this SNR is only slightly weaker than is sufficient for unstructured problems, we 
show, via an information theoretic lower bound, that one cannot hope for significantly better guarantees. 

For the second procedure, we perform a more refined analysis and show that the required SNR depends 
on how our algorithmic tool captures the cluster structure. In some cases this can lead to consistent recovery 
at much weaker SNR. The second procedure can also be adapted to recover a large fraction of the cluster. 
We also explore the performance of our procedures via an empirical study. Our results demonstrate the gains 
from exploiting both structure and adaptivity in support recovery problems. 

We put our results in context with the compressed sensing landscape in Tables 1 and 2. Here $k$ is the
cluster size and, for the structured setting, $\rho$ denotes the cut size in the graph. Near-optimal procedures for
passive and adaptive unstructured support recovery were analyzed in [12] and [10] respectively. Our work
provides both upper and lower bounds for the adaptive structured setting. Focusing on different notions
of structure, Balakrishnan et. al. show that an SNR of | is necessary and sufficient for recovering a 
small square of activations in a grid [2], while Soni and Haupt show that one can recover a tree-structured
signal with an SNR of $\sqrt{(k/m) \log k}$ [11]. Here, we study the general setting and our guarantee depends on
how well the signal is captured by our algorithmic construction. In the best case, we can tolerate an SNR of
$\frac{1}{k}\sqrt{(n/m) \log((\rho + k) \log n)}$.

2 Main Results 

Let $C^*$ denote a set of activated vertices in a known graph $G = (V, E)$ on $n$ nodes with maximal degree
$d$. We observe $C^*$ through noisy compressed measurements of the vector $x = \mu 1_{C^*}$, that is, we may select
sensing vectors $a_i \in \mathbb{R}^n$ and observe $y_i = a_i^T x + \epsilon_i$ where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ independently. In total, we are
given a sensing budget of $m$, meaning that we require $\sum_i \|a_i\|^2 \le m$. We allow for adaptivity, meaning
that the procedure may use the measurements $y_1, \ldots, y_{i-1}$ to inform the choice of the subsequent vector
$a_i$. Our goal is to develop procedures that successfully recover $C^*$ in a low signal-to-noise ratio regime.

We will require the set $C^*$, which we will henceforth call a cluster, to have small cut-size in the graph
$G$. Formally:

$$C^* \in \mathcal{C}_\rho = \{C : |\{(u, v) \in E : u \in C, v \notin C\}| \le \rho\}$$
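
As a small illustration of the cut-size constraint, the following sketch counts the edges leaving a candidate cluster and checks membership in the class of clusters with bounded cut size. The function names `cut_size` and `in_cluster_class`, and the edge-list representation of the graph, are assumptions made for this example only.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def cut_size(edges: Iterable[Edge], cluster: Set[Hashable]) -> int:
    """Count the edges with exactly one endpoint inside `cluster`."""
    return sum((u in cluster) != (v in cluster) for u, v in edges)


def in_cluster_class(edges: Iterable[Edge], cluster: Set[Hashable], rho: int) -> bool:
    """Return True when `cluster` has cut size at most `rho`."""
    return cut_size(edges, cluster) <= rho


# Path graph 0-1-2-3-4-5: a contiguous block cuts at most two edges,
# while two separated vertices cut four.
path_edges = [(i, i + 1) for i in range(5)]
contiguous = cut_size(path_edges, {2, 3})
scattered = cut_size(path_edges, {1, 4})
```

On the path graph, a contiguous block is in the class for any cut bound of at least 2, whereas scattered activations are excluded unless the bound grows with the number of separate pieces.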

Our algorithmic tool for identification of $C^*$ is a dendrogram $\mathcal{D}$, a hierarchical partitioning of $G$. Formally,
a dendrogram is a tree of blocks $\{D\}$ where each block is a connected set of vertices in $G$. The root
of $\mathcal{D}$ is $V$, the set of all vertices, and the leaves of the dendrogram are all of the singletons $\{v\}$, $v \in V$. The
sets corresponding to the children of a block $D$ should form a partition of the elements in $D$. In this sense,
the dendrogram is similar to a hierarchical clustering of the vertices of $G$, preserving connectivity in each
cluster. For now, we state the critical properties that we require of $\mathcal{D}$. We will see one way to construct such
dendrograms in Section 2.3.

Assumption 1. Let $\mathcal{D}$ be a dendrogram for $G$. We assume that

1. $\mathcal{D}$ has degree at most $d$, the maximum degree in $G$.

2. $\mathcal{D}$ is approximately balanced. Specifically, the child of any block $D$ has size at most $|D|/2$.

3. The height $L$ of $\mathcal{D}$ is at most $\log_2(n)$.

By the fact that each block of $\mathcal{D}$ is a connected set of vertices, we immediately have the following
proposition:

Proposition 2. For any $C^*$ in $\mathcal{C}_\rho$, at most $\rho$ blocks are impure at any level in $\mathcal{D}$. A block $D$ is impure if
$0 < |D \cap C^*| < |D|$.

2.1 Universal Guarantees 

With a dendrogram $\mathcal{D}$, we can sense with measurements of the form $1_D$ for each block $D$ and dig down the
hierarchy to identify the activated vertices. This procedure has the same flavor as the compressive binary
search procedure [5]. Specifically, fix a threshold $\tau$ and energy parameter $\alpha$ and for each block $D$ obtain the
measurement

$$y_D = \sqrt{\alpha}\, 1_D^T x + \epsilon_D \qquad (1)$$
Algorithm 1 Exact Recovery

Require: Dendrogram $\mathcal{D}$, sensing budget $m$, failure probability $\delta$.
Set $\alpha = \frac{m}{3 n \log_2 \rho}$, $\tau = \sigma\sqrt{\log(2 d \rho L/\delta)}$.
(1) Let $D$ be the root of $\mathcal{D}$.
(2) Obtain $y_D = \sqrt{\alpha}\, 1_D^T \mu 1_{C^*} + \epsilon_D$.
(3) If $y_D > \mu\sqrt{\alpha}\,|D| - \tau$, add $D$ to the estimate $\hat{C}$.
(4) If $\tau < y_D < \mu\sqrt{\alpha}\,|D| - \tau$, recurse on (2)-(4) with $D$'s children.
Output $\hat{C}$.

If

$$\tau < y_D < \mu\sqrt{\alpha}\,|D| - \tau \qquad (2)$$

continue sensing on $D$'s children, otherwise terminate the recursion. See Algorithm 1 for a precise description.
At a fairly weak SNR and with appropriate setting of $\tau$ and $\alpha$, we can show that this procedure will
exactly identify $C^*$, a result we formalize below:

Proposition 3. Set $\tau = \sigma\sqrt{\log(2 d \rho L/\delta)}$. If the SNR satisfies:

then with probability $\ge 1 - \delta$, Algorithm 1 recovers $C^*$ and uses a sensing budget of at most $3 n \alpha \log_2 \rho$.

We must set $\alpha$ appropriately to ensure we do not exceed our budget of $m$. With the correct setting, the
SNR requirement becomes:

$$\frac{\mu}{\sigma} \ge \sqrt{\frac{24 n}{m} \log_2 \rho \, \log\left(\frac{d \rho L}{\delta}\right)}$$
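
The top-down search can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the implementation used in our experiments: it takes $G$ to be the line graph, uses a balanced binary dendrogram whose blocks are half-open index ranges, and leaves the choice of `alpha` and `tau` to the caller. The names `children` and `exact_recovery` are hypothetical.

```python
import math
import random


def children(block):
    """Split a contiguous block (lo, hi) of the line graph into two halves."""
    lo, hi = block
    if hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def exact_recovery(n, cluster, mu, sigma, alpha, tau, rng):
    """Top-down search in the spirit of Algorithm 1.

    Returns the estimated cluster and the total sensing energy used.
    """
    estimate, energy = set(), 0.0
    stack = [(0, n)]  # start at the root block
    while stack:
        lo, hi = stack.pop()
        size = hi - lo
        active = sum(1 for v in range(lo, hi) if v in cluster)
        # y_D = sqrt(alpha) * 1_D^T x + noise, with x = mu * 1_{C*}
        y = math.sqrt(alpha) * mu * active + rng.gauss(0.0, sigma)
        energy += alpha * size  # squared norm of sqrt(alpha) * 1_D
        full_level = mu * math.sqrt(alpha) * size
        if y > full_level - tau:
            estimate.update(range(lo, hi))  # declared fully active
        elif y > tau:
            stack.extend(children((lo, hi)))  # declared impure: refine
        # otherwise the block is declared empty and dropped
    return estimate, energy
```

With `mu = 1`, `alpha = 1` and `tau = 0.25`, the three cases (empty, impure, full) are separated by a margin of at least 0.25 in the noise-free measurement, so the search returns the true cluster whenever the noise stays below that margin on every sensed block.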

Algorithm 1 performs similarly to the adaptive procedures for unstructured support recovery. For constant
$\rho$, the SNR requirement is $\mu/\sigma = \omega(\sqrt{(n/m) \log \log_2 n})$, which is on the same order as the compressive
binary search procedure [5] for recovering 1-sparse signals. For $k$-sparse signals, the best results [9, 6]
require an SNR of order $\sqrt{(n/m) \log k}$, which can be much worse than our guarantee for large signals with small $\rho$.



Thus, the procedure does enjoy a small benefit from the structure in the problem, but the generality of
our problem set up precludes more substantial performance gains for exact recovery. Indeed, we are able to
show that one cannot do much better than Algorithm 1. This information theoretic lower bound is a simple
consequence of the results from [1].

Theorem 4. Fix any graph $G$ and suppose $\rho \ge d$. If:




then $\inf_{\hat{C}} \sup_{C^* \in \mathcal{C}_\rho} P[\hat{C} \ne C^*] \to \frac{1}{2}$, so that no procedure can reliably estimate $C^* \in \mathcal{C}_\rho$.

The lower bound demonstrates one of the fundamental challenges in exploiting structure in the cluster
recovery problem: since $\mathcal{C}_\rho$ is not parameterized by cluster size, one should not hope for performance
improvements that depend on cluster size or sparsity. More concretely, if $\rho \ge d$, the set $\mathcal{C}_\rho$ contains all
singleton vertices, reducing to a completely unstructured setting. Here, the results of [5] imply that to
exactly recover a cluster of size one, it is necessary to have an SNR of at least $\sqrt{n/m}$. This is one argument for the lower
bound.

While our lower bound relies on singletons, they are not the only challenging facet of the problem. 
Another is the generality of the graph G. Indeed, nothing in our setup prevents G from being a complete 
graph on n vertices, in which case there is no structure, so one should not expect stronger results. 

The inherent difficulty of this problem is not only information theoretic, but also computational. The
typical way to exploit structure is to scan across the possible signal patterns, using the fact that the
search space is highly restricted. This is the strategy in [2], where one must identify one of $O(n)$ possible
patterns. In the cluster setting, Karger proved that the number of cuts of size $\rho$ is on the order of $\Theta(n^\rho)$ [8],
meaning that restricting signals to $\mathcal{C}_\rho$ does not significantly reduce the search space.

Even if we could sweep across all cuts of size $\rho$, without further assumptions on $G$ or $\mathcal{C}_\rho$ there could be
a number of clusters that disagree with C* on only a few vertices, and distinguishing between these would 
require high SNR. As a concrete example, if we are interested in localizing a contiguous chain of activations 

in the line graph, an adaptation of the lower bound in [2] shows that if ^ = o(max j.\J^^, \fh) tn en 
localization is impossible. The second term arises from the overlap between the contiguous blocks. It is 
independent of n, but also independent of k, showing that exploiting structure does not significantly help 
when distinguishing clusters that differ only by a few vertices. 

2.2 Cluster-Specific Guarantees

The main performance bottleneck for Algorithm 1 comes from testing whether a block $D$ of size 1 is active
or not. If there are no such singleton blocks, meaning that the cluster $C^*$ is grouped into large blocks
in $\mathcal{D}$, we might expect that Algorithm 1 or a close variant can succeed at lower SNR. We formalize this
approach in this section, giving an algorithm whose performance depends on how $C^*$ is partitioned across
the dendrogram $\mathcal{D}$.

We quantify this dependence with the notion of maximal blocks $D \in \mathcal{D}$, which are the largest blocks that
are completely active. Formally, $D$ is maximal if $D \cap C^* = D$ and $D$'s parent is impure, and we denote this set
$\mathcal{M}$. If the maximal blocks are all large, then we can hope to obtain significant performance improvements.

The algorithm consists of two phases. The first phase (the adaptive phase) is similar to Algorithm 1.
With a threshold $z_q$ and energy parameter $\alpha$, we sense on a block $D$ with

$$y_D = \sqrt{\alpha}\, 1_D^T x + \epsilon_D$$

If $y_D > z_q$ we sense on $D$'s children, and we construct a pruned dendrogram $\mathcal{K}$ of all blocks $D$ for which
$y_D > z_q$. The pruned dendrogram is much smaller than $\mathcal{D}$ but it retains a large fraction of $C^*$.

Since we have significantly reduced the dimensionality of the problem we can now use a passive localization
procedure to identify $C^*$ at a low SNR. In the passive phase, we construct an orthonormal basis $U$
for the subspace:

$$\mathrm{span}\{1_D : D \in \mathcal{K}\}$$

With another energy parameter $\beta$, we observe $y_i = \sqrt{\beta}\, u_i^T x + \epsilon_i$ for each basis vector $u_i$ and form the vector
$y = \sqrt{\beta}\, U^T x + \epsilon$ by stacking these observations. We then construct the vector $\hat{x} = U y/\sqrt{\beta}$. With the
vector $\hat{x}$ we solve the following optimization problem to identify the cluster:

$$\hat{C} = \arg\max_{C \subseteq [n]} 1_C^T \hat{x}$$






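Algorithm 2 Approximate Recovery

Require: Dendrogram $\mathcal{D}$, sensing budget parameters $\alpha$, $\beta$.
Set $\alpha$, $z_q$ as in Theorem 5.
(1) Let $D$ be the root of $\mathcal{D}$.
(2) Obtain $y_D = \sqrt{\alpha}\, 1_D^T \mu 1_{C^*} + \epsilon_D$.
(3) If $y_D > z_q$, add $D$ to $\mathcal{K}$ and recurse on (2)-(3) with $D$'s children.
Construct $U$, an orthonormal basis for span$\{1_D : D \in \mathcal{K}\}$.
Sense $y = \sqrt{\beta}\, U^T \mu 1_{C^*} + \epsilon$.
Form $\hat{x} = U y/\sqrt{\beta}$.
Output $\hat{C} = \arg\max_{C \subseteq [n]} 1_C^T \hat{x}$.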



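Setting              Sufficient SNR ($\mu/\sigma$)

One maximal block    $\omega\left(\frac{1}{k}\sqrt{(n/m) \log(k \log n)}\right)$
Uniform sizes        $\omega\left(\frac{\rho}{k}\sqrt{(n/m) \log(k \log n)}\right)$
Worst Case           $\omega\left(\sqrt{(n/m) \log(k \log n)}\right)$

Table 3: Instantiations of Theorem 5.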



which just amounts to taking all of the positive coordinates of $\hat{x}$. The full algorithm is described in
Algorithm 2. For a more concise presentation, in the following results, we omit the dependence on the
maximum degree, $d$. This localization guarantee is stated in terms of the symmetric set difference $\hat{C} \Delta C^* =
(\hat{C} \setminus C^*) \cup (C^* \setminus \hat{C})$.
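
The passive phase can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions: the retained blocks are passed in as index collections, the orthonormal basis is obtained from a singular value decomposition (any orthonormal basis of the span would do), and the `threshold` argument is a knob added for this example. Setting `threshold=0` reproduces the positive-coordinate rule described above; a positive value guards against floating-point round-off in noise-free checks. The name `passive_phase` is hypothetical.

```python
import numpy as np


def passive_phase(blocks, n, x, beta, sigma, rng, threshold=0.0):
    """Estimate x from noisy projections onto span{1_D : D in blocks}.

    Returns the reconstruction x_hat and the set of coordinates of x_hat
    that exceed `threshold`.
    """
    indicators = np.zeros((n, len(blocks)))
    for j, block in enumerate(blocks):
        indicators[list(block), j] = 1.0
    # Orthonormal basis U for the span: left singular vectors with
    # non-negligible singular values (the indicators may be dependent).
    u, s, _ = np.linalg.svd(indicators, full_matrices=False)
    U = u[:, s > 1e-10 * s.max()]
    # One measurement per basis vector, each with energy beta.
    y = np.sqrt(beta) * U.T @ x + sigma * rng.standard_normal(U.shape[1])
    x_hat = U @ y / np.sqrt(beta)
    estimate = set(np.flatnonzero(x_hat > threshold).tolist())
    return x_hat, estimate
```

Because each basis vector has unit norm, the total energy of this phase is `beta` times the dimension of the span, which is why a small pruned dendrogram permits a large `beta` for a fixed budget.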

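Theorem 5. Set $z_q$ so that $P[\mathcal{N}(0, 1) > z_q] \le q$ with $q$ as in Lemma 7; then with probability $\ge 1 - o(1)$:

$$|\hat{C} \Delta C^*| = O\left(\frac{\sigma^2}{\mu^2}\frac{(\rho + k)\,\mathrm{polylog}(n)}{\beta} + k \log n \exp\left\{-\frac{\alpha |M_{\min}|^2 \mu^2}{4\sigma^2}\right\}\right)$$

where $M_{\min} = \arg\min_{M \in \mathcal{M}} |M|$ and $k = |C^*|$, and the budget is $O(\beta(\rho + k)\,\mathrm{polylog}(n) + \alpha n(\log(\rho + k) +
\log \log n))$. In particular, with suitable choices for $\alpha$ and $\beta$, if:

$$\frac{\mu}{\sigma} = \omega\left(\frac{(\rho + k)\,\mathrm{polylog}(n)}{\sqrt{m}} + \frac{1}{|M_{\min}|}\sqrt{\frac{n}{m}\left(\log(\rho + k) + \log(k \log n)\right)}\right)$$

then $|\hat{C} \Delta C^*| \to 0$ and the budget is $O(m)$.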

The error decomposes into estimation and approximation error terms and we should distribute the
sensing budget to balance them. Note however that the energy for the adaptive phase is linear in $n$ while the
energy for the passive phase is only logarithmic in $n$, so the majority of the energy should be allocated to the
first phase. The SNR requirement comes from allocating $O(m)$ energy to each phase, and the second term
will usually dominate, particularly for small $\rho$ and $k$, which is a regime of interest. Then, the required SNR
is:

$$\frac{\mu}{\sigma} = \omega\left(\frac{1}{|M_{\min}|}\sqrt{\frac{n}{m}\left(\log(\rho + k) + \log(k \log n)\right)}\right)$$

To more concretely interpret the result, we present sufficient SNR scalings for three scenarios in Table 3.
We will think of $\rho \ll |C^*|$. The most favorable realization is when there is only one maximal block and it is
of size $k$. In this case, there is a significant gain in SNR over unstructured recovery or even Proposition 3.
Algorithm 3 FindBalance

Require: $T$ a subtree of $G$; initialize $v \in T$ arbitrarily.
loop
  Let $T'$ be the component of $T \setminus \{v\}$ of largest size.
  Let $w$ be the unique neighbor of $v$ in $T'$.
  Let $T''$ be the component of $T \setminus \{w\}$ of largest size.
  Stop and return $v$ if $|T''| \ge |T'|$.
  $v \leftarrow w$.
end loop



Another interesting case is when the maximal blocks are all at the same level in the dendrogram. In this
case, there can be at most $\rho d$ maximal blocks since each of the parents must be impure and there can only be
$\rho$ impure blocks per level. If the maximal blocks are approximately the same size, then $|M_{\min}| \approx k/\rho$, and
we arrive at the requirement in the second row of Table 3. Again we see performance gains from structure,
although there is some degradation.

Unfortunately, since the bound depends on $M_{\min}$, we do not always realize such gains. It could be
the case that $M_{\min}$ is a singleton block, in which case our bound deteriorates to the third row of Table 3.
We remark that modulo log log factors, this matches the SNR scaling for the unstructured setting. It also
nearly matches the lower bound in Theorem 4. Theorem 5 shows that the size of $M_{\min}$ is the bottleneck to
recovering $C^*$. If we are willing to tolerate missing the small blocks we can sense at low SNR, although we
are no longer guaranteed to consistently estimate $C^*$.

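Corollary 6. Let $\tilde{C} = \bigcup_{M \in \mathcal{M}, |M| \ge k} M$; then:

$$|\hat{C} \Delta \tilde{C}| = O\left(\frac{\sigma^2}{\mu^2}\frac{(\rho + k)\,\mathrm{polylog}(n)}{\beta} + k \log n \exp\left\{-\frac{\alpha k^2 \mu^2}{4\sigma^2}\right\}\right)$$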

In particular, we can recover all maximal blocks of size k with SNR on the order of which 
clearly shows the gain in exploiting structure in this problem. 

2.3 Constructing Dendrograms 

A general algorithm for constructing a dendrogram parallels the construction of spanning tree wavelets in
[10]. Given a spanning tree $T$ for $G$, the root of the dendrogram is $V$, and the children are the subtrees
around a balancing vertex $v \in T$. The dendrogram is built recursively by identifying balancing vertices
and using the subtrees as children. See Algorithm 4 for details. It is not hard to verify that a dendrogram
constructed in this way satisfies Assumption 1.
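
The recursive construction can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the spanning tree is assumed to be given as an adjacency dictionary, the search for a balancing vertex starts from the smallest vertex label (any start is allowed), and a two-vertex block is split into its two singletons, a base case we add here so that the recursion terminates. The names `components`, `find_balance` and `build_dendrogram` are hypothetical.

```python
def components(adj, nodes, v):
    """Connected components of the tree restricted to `nodes` after deleting v."""
    comps = []
    for w in adj[v]:
        if w not in nodes:
            continue
        comp, stack = {w}, [w]
        while stack:
            a = stack.pop()
            for b in adj[a]:
                if b in nodes and b != v and b not in comp:
                    comp.add(b)
                    stack.append(b)
        comps.append(comp)
    return comps


def find_balance(adj, nodes):
    """Walk toward the largest component until it stops shrinking."""
    v = min(nodes)  # arbitrary starting vertex
    while True:
        comps = components(adj, nodes, v)
        if not comps:
            return v
        largest = max(comps, key=len)
        w = next(u for u in adj[v] if u in largest)
        largest_after_move = max(components(adj, nodes, w), key=len)
        if len(largest_after_move) >= len(largest):
            return v
        v = w


def build_dendrogram(adj, nodes):
    """Return (block, children) with children built around balancing vertices."""
    nodes = frozenset(nodes)
    if len(nodes) < 2:
        return (nodes, [])
    v = find_balance(adj, nodes)
    comps = components(adj, nodes, v)
    if len(comps) == 1:
        comps.append({v})  # two-vertex block: split into singletons
    else:
        min(comps, key=len).add(v)  # attach v to the smallest component
    return (nodes, [build_dendrogram(adj, c) for c in comps])
```

Each child block stays connected because the balancing vertex is adjacent to every component it separates, so attaching it to one of them preserves connectivity.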

3 Experiments 

We conducted two simulation studies to verify our theoretical results and examine the performance of our 
algorithms empirically. The first experiment looks closely at Algorithm 1, showing that the SNR scaling
in Proposition 3 agrees with our empirical observation. In the second experiment, we compare both of our
algorithms with the algorithm from [6], which is an unstructured adaptive sensing procedure with state-of- 
the-art performance. 
Algorithm 4 BuildDendrogram

Require: $T$ is a spanning tree of $G$.
(1) Initialize $\mathcal{D} = \{\{v : v \in T\}\}$.
(2) Let $v$ be the output of FindBalance applied to $T$.
(3) Let $T_1, \ldots, T_{d_v}$ be the connected components of $T \setminus v$ and add $v$ to the smallest component.
(4) Add $\{v : v \in T_i\}$ for each $i$ as children of $T$ to $\mathcal{D}$.
(5) Recurse at (2) for each $T_i$ as long as $|T_i| \ge 2$.



Figure 1: Probability of success for Algorithm 1 as a function of rescaled budget for the torus. Curves: n=625, rho=12; n=900, rho=20; n=1600, rho=32; n=2500, rho=40. Horizontal axis: Rescaled Budget.



In Figure 1 we plot the probability of successful recovery of $C^*$ as a function of a rescaled budget for
a number of problem settings. The rescaled budget $\theta(n, m, \rho, \mu/\sigma) = \frac{\mu}{\sigma}\sqrt{\frac{m}{n \log_2 \rho \, \log(\rho \log n)}}$ was chosen so
that the condition on the SNR in Proposition 3 is equivalent to $\theta \ge c$ for some constant $c$. Proposition 3 then
implies that with this rescaling, the probability of success curves should all line up, which is the phenomenon
we observe in Figure 1. Here $G$ is the two dimensional torus and $\mathcal{D}$ was constructed using Algorithm 4.

In Figure 2 we plot the error, measured by $|\hat{C} \Delta C^*|$, as a function of $m$ for three algorithms in different
problem settings. We use both Algorithms 1 and 2 as well as the sequentially designed compressed sensing
algorithm [6], which does not exploit structure, but has near-optimal performance for unstructured sparse
recovery. We call that procedure SDC. Here $G$ is the line graph, $\mathcal{D}$ is the balanced binary dendrogram, and
$\rho = 2$ so each signal is a contiguous block.

In the top figure, $k = 10$ and since the maximal clusters are necessarily small, there should be little benefit
from structure. Indeed, we see that all three algorithms perform similarly. This demonstrates that in the
absence of structure, our procedures perform comparably to existing approaches for unstructured recovery.
When $k = 50$ (the bottom figure), we see that both Algorithms 1 and 2 outperform SDC, particularly at low
SNRs. Here, as predicted by our theory, Algorithm 2 can identify a large part of the cluster at very low SNR
by exploiting the cluster structure. In fact Algorithm 1 empirically performs well in this regime although we
do not have theory to justify this.



4 Conclusion 

We explore the role of structure and adaptivity in the support recovery problem, specifically in localizing a 
cluster of activations in a network. We show that when the cluster has small cut size, exploiting this structure 
Figure 2: Error as a function of $m$ for $n = 512$ and $k = 10, 50$ (top, bottom) demonstrating the gains from
exploiting structure. Here $G$ is a line graph and $\rho = 2$, resulting in one connected cluster.



can result in performance improvements in terms of signal-to-noise ratios sufficient for cluster recovery. In 
a more cluster-specific guarantee, we show that if the true cluster C* coincides with a dendrogram over the 
graph, then recovery can be done at much weaker signal-to-noise ratios. These results do not contradict the 
lower bound for this problem, which shows that one cannot do much better than the unstructured setting. 

While our work contributes to our understanding of the role of structure in compressive sensing, our 
knowledge is still fairly limited. We now know of some very specific instances where structured signals can 
be localized at very weak SNR, but we do not have a full characterization of this effect. Our goal was to give 
such a precise characterization, but the generality of our set-up resulted in an information-theoretic barrier 
to demonstrating significant performance gains. An interesting direction for future research is to precisely 
quantify when structure can lead to improved sensing performance and to develop algorithms that enjoy 
these gains. 



A Proof of Theorem 4

The proof is a simple extension of Theorem 2 from Davenport and Arias-Castro [5]. In particular, if $\rho \ge d$
then $\mathcal{C}_\rho$ contains all one-sparse signals. Restricting to just these signals, the results from [5] imply that we
cannot even detect if the activation is in the first or second half of the vertices unless $\mu/\sigma \ge \sqrt{n/m}$. This results
in the lower bound.

If we are also interested in introducing the cluster size parameter $k$ we can prove a similar lower
bound by reduction to one-sparse testing. If $\rho \ge kd$ then all $\binom{n}{k}$ support patterns are in $\mathcal{C}_\rho$ so we are again
in the unstructured setting. Here, the results from [1] give the lower bound.

If $\rho < kd$ then we are in a structured setting in that not all $\binom{n}{k}$ support patterns are possible. However, if
we look at the cycle graph, each contiguous block contributes 2 to the cut size, so if $\rho \ge 4$ we are allowed at
least two contiguous blocks. If $k - 1$ of the activations lie in one contiguous block, then the last activation can
be distributed in any of the $n - k + 1$ remaining vertices. Even if the localization procedure was provided

with knowledge of the location of the k — 1 activations, an SNR of ^ > y ^ ,J ^ would be necessary for 
identifying the last activation. 
B Proof of Proposition 3

Recall that for any block $D$ that we sense, we obtain $y_D = \sqrt{\alpha}\, 1_D^T \mu 1_{C^*} + \epsilon_D$. Consider a single block $D$.
Gaussian tail bounds reveal the following facts:

1. If $D \cap C^* = \emptyset$, then with probability $\ge 1 - \delta$, $y_D \le \sigma\sqrt{2 \log(1/\delta)}$.

2. If $D \cap C^* = D$, then with probability $\ge 1 - \delta$, $y_D \ge \mu\sqrt{\alpha}\,|D| - \sigma\sqrt{2 \log(1/\delta)}$.

3. Otherwise, with probability $\ge 1 - \delta$: $\mu\sqrt{\alpha} - \sigma\sqrt{2 \log(1/\delta)} \le y_D \le \mu\sqrt{\alpha}\,(|D| - 1) + \sigma\sqrt{2 \log(1/\delta)}$.
The above facts reveal that if:

$$\frac{\mu}{\sigma} \ge 2\sqrt{\frac{2 \log(1/\delta)}{\alpha}}$$

then we will correctly identify if $D$ is empty, full or impure. Assuming we perform this test correctly, we
only refine $D$ if it is impure, and Proposition 2 reveals that at most $\rho$ clusters can be impure per level. For
each of these $\rho$ clusters that we refine, we search on at most $d$ clusters at the subsequent level. The total
budget that we use is (recall that $L$ is the height of $\mathcal{D}$):

$$\sum_{l=1}^{L} \alpha \min\left\{n, \rho d \frac{n}{2^l}\right\} \le \alpha\left(n \log_2(\rho d) + \sum_{i=0}^{L - \log_2(\rho d)} \frac{n}{2^i}\right) \le \alpha(n \log_2(\rho d) + 2n) \le 3\alpha n \log_2(\rho d)$$

Setting $\alpha$ as in the Proposition makes this quantity smaller than $m$. Finally, we take a union bound over the
$\rho d L$ blocks that we sense on, and plug in our bound on $\alpha$ to arrive at the final rate of:

$$\frac{\mu}{\sigma} \ge \sqrt{\frac{24 n \log_2(\rho d) \log(\rho d L/\delta)}{m}}$$

The thresholds are specified to ensure that the failure probability for all of the tests is at most $\delta$.
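
The budget calculation above is easy to check numerically. The sketch below sums the per-level energy and compares it to the bound; the function name `adaptive_energy` and the chosen parameter values are illustrative assumptions.

```python
import math


def adaptive_energy(n, rho, d, alpha, height):
    """Sum over levels l = 1..height of alpha * min(n, rho * d * n / 2**l)."""
    return sum(alpha * min(n, rho * d * n / 2 ** l) for l in range(1, height + 1))


# Example: n = 1024 vertices, rho * d = 4, unit energy per block entry.
n, rho, d, alpha = 1024, 2, 2, 1.0
used = adaptive_energy(n, rho, d, alpha, height=int(math.log2(n)))
bound = 3 * alpha * n * math.log2(rho * d)
```

In this example the first two levels cost the full $n$ each, after which the cost halves at every level, so the total stays well below the bound.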

C Proof of Theorem 5

To prove Theorem 5 we must analyze each phase of the procedure. We first turn to the adaptive phase. By
setting the threshold $z_q$ correctly, we retain a large fraction of $C^*$ while removing a large number of inactive
nodes. We measure the fraction of $C^*$ lost by the projection onto the basis $U$ for the subspace spanned by the
blocks in $\mathcal{K}$. In the passive phase, we use the fact that $|\mathcal{K}|$ is small to bound the MSE of the reconstruction
$E\|\hat{x} - x\|^2$. We then show how to translate this MSE guarantee into an error guarantee for $\hat{C}$.

With all the results in the following sections we will be able to bound $|\hat{C} \Delta C^*|$ with probability $\ge 1 - 3\delta$
as:

$$|\hat{C} \Delta C^*| \le \frac{4}{\mu^2}\|\hat{x} - \mu 1_{C^*}\|^2 \qquad (4)$$

$$\le \frac{4 c \sigma^2 |\mathcal{K}|}{\mu^2 \beta} + \frac{4 |C| L}{\delta}\exp\left\{-\frac{1}{4}\alpha |M_{\min}|^2 \mu^2/\sigma^2\right\} \qquad (5)$$

$$\le \frac{8 c \sigma^2 L^2 (3rd \log(rdL/\delta) + |C|)^2}{\mu^2 m} \qquad (6)$$

$$+ \frac{4 |C| L}{\delta}\exp\left\{-\frac{m |M_{\min}|^2 \mu^2}{24 n \log_2(4rd^2 \log(rdL/\delta) + |C|)\,\sigma^2}\right\} \qquad (7)$$
Here Equation (4) follows from our analysis of the optimization phase (Lemma 10), and Equation (5) follows
from the bounds in Section C.2. The last step follows by plugging in bounds on $\alpha$ and $\beta$ if we want to
allocate $m/2$ energy to each phase. Specifically the bound on $\alpha$ comes from Lemma 9 while the bound
for $\beta$ comes from Lemma 7. We obtain the final result by plugging in the bounds $L \le \lceil \log_2 n \rceil$ and
$r \le \rho \lceil \log_2 n \rceil$. With these bounds, the first term is $o(1)$ as long as:

$$\frac{\mu}{\sigma} = \omega\left(\frac{(\rho d \log_2 n \log(\rho d \log_2 n/\delta) + |C|) \log_2 n}{\sqrt{m}}\right)$$

The second term is $o(1)$ when:

$$\frac{\mu}{\sigma} = \omega\left(\frac{1}{|M_{\min}|}\sqrt{\frac{n}{m} \log_2(\rho d^2 \log_2 n \log(\rho d \log_2 n/\delta) + |C|) \log(|C| L/\delta)}\right)$$



C.1 The Adaptive Phase

Our analysis will focus on recovering maximal blocks $D \in \mathcal{D}$, which are the largest blocks that contain only
activated vertices. Formally, $D$ is maximal if $D \cap C^* = D$ and if $D$'s parent contains some unactivated
vertices. We are also interested in identifying impure blocks, which partially overlap with $C^*$. Suppose
there are $r$ such impure clusters. Let $L$ denote the height of $\mathcal{D}$.

The first lemma helps us bound the number of empty nodes that we retain:

Lemma 7. Threshold at $\sigma z_q$ where $P[\mathcal{N}(0, 1) > z_q] \le q$ and:

y/5- 1 
q= ^d 

Then with probability $\ge 1 - \delta$ the pruned dendrogram contains at most $3rd \log(rdL/\delta) + |C|$ blocks per
level for a total of at most $L(3rd \log(rdL/\delta) + |C|)$.

Proof. For the first claim, we analyze the adaptive procedure on an empty dendrogram, showing that we
retain no more than $3 \log(L/\delta)$ blocks per level. The proof is by induction on the level $l$. Let the inductive
hypothesis be that $t_l \le 3 \log(L/\delta)$ where $t_l$ is the number of nodes retained at the $l$th level. Then by the Chernoff
bound,

$$P[t_l - E t_l > \epsilon] \le \exp\left\{-\frac{\epsilon^2}{3 E t_l}\right\}$$

$E t_l$ can be bounded by $d q t_{l-1}$ since each of the blocks that we retain at the $(l-1)$st level can have at most $d$
children and since we retain each block with probability $q$ in expectation. With a union bound across all $L$
levels, we have that with probability $\ge 1 - \delta$:

$$t_l \le d q t_{l-1} + \sqrt{3 d q t_{l-1} \log(L/\delta)}$$

Applying the inductive hypothesis and the definition of $q$:

$$t_l \le 3 \log(L/\delta)(dq + \sqrt{dq}) \le 3 \log(L/\delta)$$

Thus for each empty dendrogram, we retain at most $3L \log(L/\delta)$ blocks.

Each of the $r$ impure clusters can spawn off at most $d$ empty subtrees in the dendrogram. Taking a union
bound over each of these $rd$ empty subtrees shows that at most $3rdL \log(rdL/\delta)$ empty blocks are retained.
There are at most $|C| L$ active blocks, which gives us a bound on the size of $\mathcal{K}$. □

Next we compute the probability that we fail to retain a maximal cluster:

Lemma 8. For any maximal cluster $M$, the probability that $M \notin \mathcal{K}$ is bounded by:

$$P[M \notin \mathcal{K}] \le L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha}\,|M| \mu/\sigma - z_q\right)^2\right\}$$

as long as $\sqrt{\alpha}\,|M| \mu \ge \sigma z_q$.

Proof. We fail to retain a maximal cluster $M$ if we throw away any of its ancestors in the dendrogram. All
the ancestors of $M$ have at least $|M|$ activations so $E y_D \ge \mu\sqrt{\alpha}\,|M|$ for each of $M$'s ancestors $D$. All $y_D$ have
the same variance $\sigma^2$. By a union bound and Gaussian tail inequality the failure probability is at most:

$$P[M \notin \mathcal{K}] \le L\, P[y_M < \sigma z_q] \le L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha}\,|M| \mu/\sigma - z_q\right)^2\right\}$$

□
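
To get a feel for the quantities in Lemma 8, the following sketch computes a threshold $z_q$ from a target retention probability $q$ for empty blocks and evaluates the resulting bound on the probability of losing a maximal block. It uses the standard library's `statistics.NormalDist`; the function names and the example values are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def threshold_for(q):
    """Return z_q such that P[N(0, 1) > z_q] = q."""
    return NormalDist().inv_cdf(1.0 - q)


def miss_probability_bound(alpha, block_size, snr, z_q, height):
    """Evaluate L * exp(-(sqrt(alpha) * |M| * mu/sigma - z_q)^2 / 2).

    `snr` stands for mu/sigma and `height` for the dendrogram height L.
    The bound is only claimed when the margin is non-negative.
    """
    margin = math.sqrt(alpha) * block_size * snr - z_q
    if margin < 0:
        raise ValueError("bound requires sqrt(alpha) * |M| * mu >= sigma * z_q")
    return height * math.exp(-0.5 * margin ** 2)
```

The bound decays with the square of the block size, which is the source of the gains for large maximal blocks: doubling $|M|$ has the same effect as quadrupling the energy parameter $\alpha$.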

To complete the adaptive phase, we must set $\alpha$ so that we use at most half of the budget.

Lemma 9. The energy used in the adaptive phase is at most:

$$\alpha\left(3n \log_2(4rd^2 \log(rdL/\delta) + |C|)\right)$$

Proof. At level $l$ we retain at most $3rd \log(rdL/\delta)$ empty blocks, so we sense on at most $3rd^2 \log(rdL/\delta) +
(d - 1)\rho$ empty blocks (the at most $\rho$ impure blocks could spawn off up to $d - 1$ empty ones). We also sense
on at most $\rho$ impure blocks and also sense every completely active block. In total we sense on no more than:

$$3rd^2 \log(rdL/\delta) + d\rho + |C| \le 4rd^2 \log(rdL/\delta) + |C|$$

blocks ($\rho \le r$) at the $(l + 1)$st level. Since each block at the $l$th level has size at most $n/2^l$ we can bound the
total energy as:

$$\alpha \sum_{l=1}^{L} \min\left\{n, (4rd^2 \log(rdL/\delta) + |C|)\frac{n}{2^l}\right\} \le \alpha\left(n \log_2(4rd^2 \log(rdL/\delta) + |C|) + 2n\right) \le \alpha\left(3n \log_2(4rd^2 \log(rdL/\delta) + |C|)\right)$$

Here, to arrive at the second expression, we noticed that at the top levels, sensing on all of the blocks is a sharper
bound than the one we computed, which produces the first term. The second term comes from the fact that
since we sense a constant number of blocks at each level, the budget is geometrically decreasing. □

In particular, setting:

$$\alpha = \frac{m}{6n \log_2(4rd^2 \log(rdL/\delta) + |C|)}$$

the budget for the adaptive phase is $\le m/2$.
C.2 The Passive Phase

In the passive phase, we need to compute two key quantities, (1) the energy of $1_{C^*}$ that remains in the span
of $\mathcal{K}$ and (2) the estimation error of the projection that we perform. Recall that the space we are interested in
is $\mathcal{U} = \mathrm{span}\{1_D\}_{D \in \mathcal{K}}$ and let $U$ be a basis for this subspace. Let $\hat{\mathcal{M}}$ denote the maximal clusters retained
in the adaptive phase while $\mathcal{M}$ denotes all of the maximal clusters. Throughout this section let $x = \mu 1_{C^*}$.
Since $\mathrm{span}\{1_M\}_{M \in \hat{\mathcal{M}}}$ is a subspace of $\mathcal{U}$ we know that:

$$\|P_U 1_{C^*}\|^2 \ge \sum_{M \in \hat{\mathcal{M}}} \left(\frac{1_M^T 1_{C^*}}{\|1_M\|}\right)^2 = \sum_{M \in \hat{\mathcal{M}}} |M|$$

which means that (using Lemma 8):

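$$E\|(I - P_U) 1_{C^*}\|^2 \le E \sum_{M \notin \mathcal{K}} |M| = \sum_{M \in \mathcal{M}} |M|\, P[M \notin \mathcal{K}]$$

$$\le \sum_{M \in \mathcal{M}} |M| L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha}\,|M| \mu/\sigma - z_q\right)^2\right\}$$

$$\le |C| L \exp\left\{-\frac{1}{2}\left(\sqrt{\alpha}\,|M_{\min}| \mu/\sigma - z_q\right)^2\right\}$$

Since $q$ is a constant, $z_q$ is also constant. If $\mu/\sigma \ge 2 z_q/(|M_{\min}| \sqrt{\alpha})$ (this will be dominated by other
restrictions on the SNR) then this expression is bounded by:

$$\le |C| L \exp\left\{-\frac{1}{4}\alpha |M_{\min}|^2 \mu^2/\sigma^2\right\}$$

Applying Markov's inequality, we have that with probability $\ge 1 - \delta$:

$$\|(I - P_U) 1_{C^*}\|^2 \le \frac{|C| L}{\delta} \exp\left\{-\frac{1}{4}\alpha |M_{\min}|^2 \mu^2/\sigma^2\right\}$$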
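Now we study the passive sampling scheme. If $y = \sqrt{\beta}\, U^T x + \epsilon$ where $\epsilon \sim \mathcal{N}(0, \sigma^2 I_{|\mathcal{K}|})$ then:

$$\hat{x} = \sqrt{1/\beta}\, U y = P_U x + \sqrt{1/\beta}\, U \epsilon$$

So that:

$$\|\hat{x} - P_U x\|^2 = \frac{1}{\beta}\|U \epsilon\|^2 = \frac{1}{\beta}\|z\|^2$$

where $z \sim \mathcal{N}(0, \sigma^2 I_{|\mathcal{K}|})$ is a $|\mathcal{K}|$-dimensional Gaussian vector. Concentration results for Gaussian vectors
(or Chi-squared distributions) show that there is a constant $c$ such that for $n$ large enough $\|z\|^2 \le c \sigma^2 |\mathcal{K}|$
with probability $\ge 1 - \delta$.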

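Putting these two bounds together gives us a high probability bound on the squared error (note that the
cross term is zero since $\hat{x} \in \mathcal{U}$):

$$\|\hat{x} - x\|^2 \le \|\hat{x} - P_U x\|^2 + \|(I - P_U) x\|^2$$

$$\le \frac{c \sigma^2 |\mathcal{K}|}{\beta} + \frac{\mu^2 |C| L}{\delta} \exp\left\{-\frac{1}{4}\alpha |M_{\min}|^2 \mu^2/\sigma^2\right\}$$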
C.3 Recovering C*

The error guarantee of the optimization phase is based on the following lemma:

Lemma 10. Let $\hat{C}$ denote the solution to:

$$\arg\max_{C \subseteq [n]} \hat{x}^T 1_C$$

then:

$$|\hat{C} \Delta C^*| \le \frac{4}{\mu^2}\|\hat{x} - x\|^2$$

Proof. First note that:

$$\|\mu 1_{\hat{C}} - \hat{x}\|^2 = \mu^2 |\hat{C}| + \|\hat{x}\|^2 - 2\mu \hat{x}^T 1_{\hat{C}} \le \mu^2 |C^*| + \|\hat{x}\|^2 - 2\mu \hat{x}^T 1_{C^*} = \|x - \hat{x}\|^2$$

Which tells us that:

$$\mu^2 |\hat{C} \Delta C^*| = \|\mu 1_{\hat{C}} - x\|^2 \le 4\|x - \hat{x}\|^2$$

□

C.4 Proof of Corollary 6

The proof of the corollary parallels that of the main theorem. In the adaptive phase, we instead show that with 
high probability we retain all clusters of size > k for some parameter k. Then since we are not interested in 
recovering the smaller clusters, we can safely ignore the energy in C* that is orthogonal to U. This means 
that the approximation error term from the previous proof can be ignored. 

Lemma 11. With probability > 1 — 5 we retain all clusters of size > k as long as: 



fj, 1/2, ,L\C\ S 



Proof. As in the proof of Lemma 8 we can proceed with a union bound. For a single block of size $\ge k$:

$$P[M \notin \mathcal{K}] \le L\, P[y_M < \sigma z_q] \le L \exp\left\{-\left(\sqrt{\alpha}\, k \mu/\sigma - z_q\right)^2/2\right\}$$

There are at most $|C|/k$ such maximal blocks so with a union bound, we arrive at the claim. □

The results from the adaptive phase show that all of the sufficiently large maximal clusters are retained
in $\mathcal{K}$. If we let $\tilde{C} = \bigcup_{M \in \mathcal{M}, |M| \ge k} M$ then $\|(I - P_U) 1_{\tilde{C}}\|^2 = 0$ with probability $\ge 1 - \delta$. Applying the
results from the passive phase, in particular Lemma 10, we have:

$$|\hat{C} \Delta \tilde{C}| \le \frac{4 c \sigma^2 |\mathcal{K}|}{\mu^2 \beta}$$

Plugging in for $|\mathcal{K}|$ using the same bound as before, and setting $\alpha$ as we did before gives the corollary.